Identification of conserved Drosophila-specific euchromatin-restricted non-coding sequence motifs.

Authors

  • Chol-Hee Jung
  • Igor V Makunin
  • John S Mattick
Abstract

Non-protein-coding DNA comprises the majority of animal genomes but its functions are largely unknown. We identified over 17,000 different tetranucleotide pairs in the Drosophila melanogaster genome that are over-represented at distances up to 100 nt in conserved non-exonic sequences. Those exhibiting the highest information content in surrounding nucleotides were classified into five groups: tRNAs, motifs associated with histone genes, Suppressor-of-Hairy-wing binding sites, and two sets of previously unrecognized motifs (DLM3 and DLM4). There are hundreds to thousands of copies of DLM3 and DLM4, respectively, in the genome, located almost exclusively in non-coding regions. They have similar copy numbers among drosophilids, but are largely absent in other insects. DLM3 is likely a cis-regulatory element, whereas DLM4 sequences are capable of forming a short hairpin structure and are expressed as approximately 80 nt RNAs. This work reports the existence of Drosophila genus-specific sequence motifs, and suggests that many more novel functional elements may be discovered in genomes using the general approach outlined herein.
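
The abstract does not say how over-representation was scored, so the following is only an illustrative sketch of the general idea: count pairs of tetranucleotides at each spacing up to 100 nt, then compare each count with what independent placement would predict. The function names, the restriction to non-overlapping pairs, and the simple observed/expected ratio are all assumptions made for this example, not the authors' method.

```python
from collections import Counter

K = 4  # tetranucleotide length


def count_kmer_pairs(seq, max_dist=100, k=K):
    """Count ordered (kmer_a, kmer_b, distance) co-occurrences in one sequence.

    `distance` is the offset between the start positions of the two k-mers.
    Only non-overlapping pairs are counted (k <= distance <= max_dist);
    that restriction is an assumption of this sketch.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    pairs = Counter()
    for i, a in enumerate(kmers):
        for d in range(k, max_dist + 1):
            j = i + d
            if j >= len(kmers):
                break
            pairs[(a, kmers[j], d)] += 1
    return pairs, Counter(kmers)


def observed_over_expected(pairs, singles, key):
    """Ratio of the observed pair count to a naive independence expectation.

    Expected count = freq(a) * freq(b) * (number of start-position pairs
    separated by d). A real analysis would need a proper background model.
    """
    a, b, d = key
    n = sum(singles.values())  # number of k-mer start positions
    expected = (singles[a] / n) * (singles[b] / n) * (n - d)
    return pairs[key] / expected


if __name__ == "__main__":
    toy = "ACGTTTTTTTGGCC"  # made-up sequence, not real data
    pairs, singles = count_kmer_pairs(toy, max_dist=10)
    key = ("ACGT", "GGCC", 10)
    print(pairs[key], observed_over_expected(pairs, singles, key))
```

In practice the counts would be pooled over all conserved non-exonic regions rather than taken from a single string, and a ratio on a toy sequence this short is not meaningful in itself.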

Related articles

Prediction of similarly acting cis-regulatory modules by subsequence profiling and comparative genomics in Drosophila melanogaster and D.pseudoobscura

MOTIVATION To date, computational searches for cis-regulatory modules (CRMs) have relied on two methods. The first, phylogenetic footprinting, has been used to find CRMs in non-coding sequence, but does not directly link DNA sequence with spatio-temporal patterns of expression. The second, based on searches for combinations of transcription factor (TF) binding motifs, has been employed in genom...

Interspecific comparison of the period gene of Drosophila reveals large blocks of non-conserved coding DNA.

We have cloned and sequenced the coding region of the period (per) gene from Drosophila pseudoobscura and D. virilis. A comparison with that of D. melanogaster reveals that the conceptual translation products consist of interspersed blocks of conserved and non-conserved amino acid sequence. The non-conserved portion, comprising approximately 33% of the protein sequence, includes the perfect Thr...

DrosOCB: a high resolution map of conserved non coding sequences in Drosophila

Comparative genomics methods are widely used to aid the functional annotation of non coding DNA regions. However, aligning non coding sequences requires new algorithms and strategies, in order to take into account extensive rearrangements and turnover during evolution. Here we present a novel large scale alignment strategy which aims at drawing a precise map of conserved non coding regions betw...

Big Genomes Facilitate the Comparative Identification of Regulatory Elements

The identification of regulatory sequences in animal genomes remains a significant challenge. Comparative genomic methods that use patterns of evolutionary conservation to identify non-coding sequences with regulatory function have yielded many new vertebrate enhancers. However, these methods have not contributed significantly to the identification of regulatory sequences in sequenced invertebr...

ITS1, 5.8S and ITS2 secondary structure modelling for intra-specific differentiation among species of the Colletotrichum gloeosporioides sensu lato species complex

The Colletotrichum gloeosporioides species complex is among the most destructive fungal plant pathogens in the world, however, identification of member species which are of quarantine importance is impacted by a number of factors that negatively affect species identification. Structural information of the rRNA marker may be considered to be a conserved marker which can be used as supplementary ...

Journal:
  • Genomics

Volume 96, Issue 3

Pages -

Publication date 2010